Current computational systems are heterogeneous by nature, featuring a combination of CPUs and GPUs. As the latter are becoming an established platform for high-performance computing, the focus is shifting towards the seamless programming of these hybrid systems as a whole. The distinct nature of the architectural and execution models in place raises several challenges, as the best hardware configuration is behaviour- and workload-dependent. In this paper, we address the execution of compound, multi-kernel OpenCL computations in multi-CPU/multi-GPU environments. We address how these computations may be efficiently scheduled onto the target hardware, and how the system may adapt itself to changes in the workload to process and to fluctuations in the CPU's load. An experimental evaluation attests to the performance gains obtained by the conjoined use of the CPU and GPU devices, when compared to GPU-only executions, and also by the use of data-locality optimizations in CPU environments.